allCode.jl
. It's been tested under Julia 1.9
.For more accurate benchmarks, each variable is wrapped in
ref(x) = (Ref(x))[]
and interpolated through $
. Action | Keyboard Shortcut |
---|---|
Open All Codes and Outputs in a Post | Ctrl + 🠙 |
Close All Codes and Outputs in a Post | Ctrl + 🠛 |
Close Any Popped Up Window (like this one) | Esc |
Previous Post | Ctrl + 🠘 |
Next Post | Ctrl + 🠚 |
List of Sections in a Post | Ctrl + x |
List of Keyboard Shortcuts | Ctrl + z |
allCode.jl
. It's been tested under Julia 1.9
.ref(x) = (Ref(x))[]
and interpolated through $
. We continue analyzing approaches to enhancing performance in Julia. The focus is in particular on methods that are relatively easy to implement, in the sense of not requiring modifying the code substantially.
In the previous post, we explored lazy broadcast. This technique is useful when multiple intermediate steps are needed to obtain a final result. In this post, we'll delve into another approach to improving performance: tuples and static vectors. They are helpful when we work with a collection of objects that have a fixed size and contain a small number of elements. While the exact definition of small is unspecified, a rough guideline is to consider collections with less than 100 elements. In fact, I recommend using tuples and static vectors only when their benefits are evident, which typically means collection with about 20 elements.
We'll illustrate their application through tuples, as static vectors are tuples under the hood. It's worth remarking that static arrays are more versatile than tuples, as they can accommodate matrices and mutable objects. Nevertheless, we'll treat them as the same, since these particular uses won't be part of the post.
The post's main conclusion is that tuples/static arrays can be quite powerful. But you need to be particularly careful, as their misuse can introduce type instabilities and hurt performance severely.
Let's concentrate on a concrete situation to facilitate our intuition. Suppose that a university is assessing potential students for their admission. As the university has too many candidates, they decide to perform a screening procedure. This procedure involves the candidates taking a pre-test, and then the university comparing their scores with the admission thresholds of previous years. Only those candidates above the threshold of at least one previous year will be considered.
To formalize the situation, let's create some mock data. The details for constructing the dataset are irrelevant for our purposes, and hence we'll omit them. [note] See the code uploaded if you want to review the procedure. What matters is that we'll use two variables, named pre_scores
and past_thresholds
, which are created through the function generate_variables
.
Random.seed!(123) #setting the seed for reproducibility
pre_scores, past_thresholds = generate_variables(nr_years = 20, nr_people = 10_000)
isabove(score, thresholds) = any(score .> thresholds)
areabove(score, thresholds) = isabove.(score, Ref(thresholds))
past_thresholds
captures the thresholds of previous years, which is 20 years in our example. In turn, pre_scores
represents the scores of the candidates, with 10,000 applicants assumed in our example. Likewise, the function isabove
defines the university's procedure for a single student. It checks whether the candidate is above the threshold in at least one of the 20 previous years. [note] In case you're not familiar with the function any
, it takes a vector as an argument and returns true
if at least one of the vector's elements is true
. The function areabove
extends this by applying isabove
to every candidate. The variables look as follows.
pre_scores
10000-element Vector{Float64}:
1.8267202241683234
3.161299600281294
4.829613855482266
â‹®
2.2488030844480194
2.162875169226787
past_thresholds
20-element Vector{Float64}:
5.678547557310528
9.325358309487532
5.1380234173752815
â‹®
5.682373858994395
5.285879918833311
Our goal is to compare the performance of the function isabove
based on how we handle past_thresholds
. In the baseline scenario, we consider past_thresholds
as an ordinary vector, and compare it against past_thresholds
defined as a tuple/static vector. past_thresholds
is well-suited for the use of tuples/static vectors, as it consists of 20 elements and its size will remain fixed throughtout the analysis.
Since static vectors are ultimately tuples, we exemplify the results by using the latter. The following code snippet reveals a significant increase in performance, even when we consider a relatively low number of candidates. Furthermore, it shows the simplicity of its implementation, requiring only the application of the function Tuple
. [note] The function ref
wrapping each variable is employed to prevent the compiler from optimizing the benchmark. Otherwise, the time reported would be artificially low, because the processor repeats the result rather than the operation. See here for instance.
@btime areabove(ref($pre_scores), ref($past_thresholds))
515.200 μs (20004 allocations: 943.06 KiB)
@btime areabove(ref($pre_scores), ref(Tuple($past_thresholds)))
8.733 μs (24 allocations: 6.03 KiB)
In the following, we'll demonstrate that tuples/static vectors need to be handled carefully, as their incorrect usage can easily introduce type instabilities. Keep in mind that running a type-unstable function in Julia not only reduces performance, but basically kills it.
The possibility of type instability arises because tuples/static vectors are a data type defined by both their elements' type (Float64
in our example) and the number of elements contained. This determines that a function is type stable if the compiler knows in particular the tuple's length at the execution time. However, this is not automatically achieved, because Julia chooses a function's method according to the types of its arguments, not its values.
To illustrate the problem, let's consider various approaches to converting past_thresholds
into tuples. The first one is simply the approach we used above. The key of this method not creating any issue is that the tuple is passed into areabove
as an argument of the function. However, it'd be natural to instead pass past_thresholds
as is (i.e., as a vector), and then convert it into a tuple within the function. This is covered by the codes in Tuple 2 to Tuple 5, where the problem may arise.
tuple_thresholds = Tuple(past_thresholds)
@btime areabove(ref($pre_scores), ref($tuple_thresholds)) #type stable
8.167 μs (3 allocations: 5.55 KiB)
function areabove(scores, thresholds)
tuple_thresholds = Tuple(thresholds)
isabove.(scores, Ref(tuple_thresholds))
end
@btime areabove(ref($pre_scores), ref($past_thresholds)) #type unstable
23.100 μs (28 allocations: 6.30 KiB)
function areabove(scores, thresholds)
tuple_thresholds = NTuple{length(thresholds), eltype(thresholds)}(thresholds)
isabove.(scores, Ref(tuple_thresholds))
end
@btime areabove(ref($pre_scores), ref($past_thresholds)) #type unstable
23.200 μs (11 allocations: 6.25 KiB)
function areabove(scores, thresholds)
tuple_thresholds = NTuple{20,eltype(thresholds)}(thresholds)
isabove.(scores, Ref(tuple_thresholds))
end
@btime areabove(ref($pre_scores), ref($past_thresholds)) #type stable
8.167 μs (3 allocations: 5.55 KiB)
function areabove(scores, thresholds, ::Val{N}) where {N}
tuple_thresholds = NTuple{N,eltype(thresholds)}(thresholds)
isabove.(scores, Ref(tuple_thresholds))
end
@btime areabove(ref($pre_scores), ref($past_thresholds), Val(length(ref($past_thresholds)))) #type stable
8.333 μs (3 allocations: 5.55 KiB)
Tuple 2 creates a type instability because the compiler analyzes the function, but doesn't figure out in advance the number of elements in tuple_thresholds
. Therefore, it treats the tuple as potentially comprising any number of elements. Tuple 3 reveals that the issue is not solved by explicitly indicating the length and type of its elements—Julia doesn't dispatch the method by values, but by types. This means that Julia doesn't choose the function's method by computing length(thresholds)
beforehand.
Tuple 4 shows an easy way to restore type stability, which consists in directly indicating the number of elements when creating the tuple. This also demonstrates that the problem is not due to the type of the tuple's elements: we can still use eltype(thresholds)
, as the compiler knows the type of thresholds
in the function areabove
. However, the solution is not practical, because it requires indicating the tuple's length in advance.
This leads us to Tuple 5, which is the solution generally used in practice. It uses Val
as an artifact for dispatching the method by values. Expressed in words, this entails using Val{N}
as a function's argument, where N
is the tuple's length. In this way, the compiler knows the number of elements in the tuple, and can dispatch the method accordingly. Notice that Val
is called in the function by using curly brackets (i.e., Val(length(past_thresholds))
).
If you observe the running times of Tuple 2 and 3 above, you'll notice that the type-instability created isn't a significant problem. However, this only happens because we're immediately passing the tuple to the function isabove
, and type instabilities do not arise if we use a tuple/static vector as a function's argument.
This is part of a more general technique called barrier functions. [note] See here for the official documentation's explanation. It entails the presence of an "inner" function solving the type instability of the "parent" function.
The use of barrier functions allows the user to maintain code readibility by avoiding the use of Val
. However, it's crucial to keep in mind that barrier functions may not always be a suitable solution, especially if the type instability remains unsolved before a tight loop. This scenario is demonstrated in the following code, where we simplify the illustration by considering an abstract example.
Random.seed!(123) #setting the seed for reproducibility
x = rand(10)
function example1(x)
tuple_x = Tuple(x)
for _ in 1:100_000
log.(tuple_x)
end
end
@btime example1(ref($x)) # type-unstable
13.301 ms (300011 allocations: 27.47 MiB)
Random.seed!(123) #setting the seed for reproducibility
x = rand(10)
function example2(x, ::Val{N}) where {N}
tuple_x = NTuple{N,eltype(x)}(x)
for _ in 1:100_000
log.(tuple_x)
end
end
@btime example2(ref($x), Val(length(ref($x)))) # type-stable
2.544 ms (0 allocations: 0 bytes)
Random.seed!(123) #setting the seed for reproducibility
x = rand(10)
function barrier_function(x)
for _ in 1:100_000
log.(x)
end
end
function example3(x)
tuple_x = Tuple(x)
barrier_function(tuple_x)
end
@btime example3(ref($x)) # type-unstable but minimum penalty
2.543 ms (11 allocations: 256 bytes)
We conclude this post by demonstrating how to avoid type instabilities when using static vectors. Essentially, the underlying principles resemble those for tuples, but the syntax for creating static vectors differs.
function with_SVector_barrier(scores, thresholds)
sthresholds = SVector(thresholds...)
areabove(pre_scores, sthresholds)
end
@btime with_SVector_barrier(ref($pre_scores), ref($past_thresholds)) #type unstable
20.200 μs (26 allocations: 6.38 KiB)
function with_SVector_val(scores, thresholds, ::Val{N}) where {N}
sthresholds = SVector{N, eltype(thresholds)}(thresholds)
areabove(scores, sthresholds)
end
@btime with_SVector_val(ref($pre_scores), ref($past_thresholds), Val(length(ref($past_thresholds)))) #type stable
19.500 μs (3 allocations: 5.55 KiB)
function with_SVector_macro(scores, thresholds)
sthresholds = @SVector [thresholds[i] for i in eachindex(past_thresholds)]
areabove(scores, sthresholds)
end
@btime with_SVector_macro(ref($pre_scores), ref($past_thresholds)) #type stable
19.500 μs (3 allocations: 5.55 KiB)
Notice in particular how a static vector is created in SVector 3
. This involves specifying the vector's range by using either a global variable (as in our example with past_thresholds
) or a literal value (e.g., 20)—the package StaticArrays
dictates that the range must be specified in this manner.